K-medians clustering

In statistics and data mining, ''k''-medians clustering〔A. K. Jain and R. C. Dubes, ''Algorithms for Clustering Data''. Prentice-Hall, 1988.〕〔P. S. Bradley, O. L. Mangasarian, and W. N. Street, "Clustering via Concave Minimization," in Advances in Neural Information Processing Systems, vol. 9, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, pp. 368–374.〕 is a cluster analysis algorithm. It is a variation of ''k''-means clustering in which, instead of calculating the mean of each cluster to determine its centroid, one calculates the median. This has the effect of minimizing error over all clusters with respect to the 1-norm distance metric, as opposed to the squared 2-norm distance metric that ''k''-means uses.
This relates directly to the ''k''-median problem which is the problem of finding ''k'' centers such that the clusters formed by them are the most compact. Formally, given a set of data points ''x'', the ''k'' centers ''c''''i'' are to be chosen so as to minimize the sum of the distances from each ''x'' to the nearest ''c''''i''.
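Using the 1-norm distance from the definition above, the objective just described can be written as (''X'' denotes the set of data points):

```latex
\min_{c_1, \ldots, c_k} \; \sum_{x \in X} \; \min_{1 \le i \le k} \lVert x - c_i \rVert_1
```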
The criterion function formulated in this way is sometimes a better criterion than that used in the ''k''-means clustering algorithm, in which the sum of the squared distances is used. The sum of distances is widely used in applications such as facility location.
The proposed algorithm uses Lloyd-style iteration, which alternates between an expectation (E) and a maximization (M) step, making it an expectation–maximization algorithm. In the E step, all objects are assigned to their nearest median. In the M step, the medians are recomputed by taking the median in each single dimension.
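The E/M loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up here, the first ''k'' points seed the centers for determinism (practical code would use random or ''k''-means++-style seeding), and convergence checks are omitted.

```python
def manhattan(p, q):
    """1-norm (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def coord_median(pts):
    """Coordinate-wise median of a list of tuples: the median is taken
    independently in each single dimension."""
    def med(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return tuple(med(dim) for dim in zip(*pts))

def k_medians(points, k, iters=20):
    """Lloyd-style k-medians (illustrative: seeds centers with the
    first k points rather than a randomized strategy)."""
    centers = list(points[:k])
    for _ in range(iters):
        # E step: assign every point to its nearest center under the 1-norm.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: manhattan(p, centers[i]))
            clusters[nearest].append(p)
        # M step: recompute each center as the per-dimension median.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = coord_median(cl)
    return centers
```

On two well-separated groups, e.g. `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with ''k'' = 2, the loop settles on the coordinate-wise medians (0, 0) and (10, 10) after two iterations.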
== Medians and medoids ==

The median is computed in each single dimension in the Manhattan-distance formulation of the ''k''-medians problem, so the individual attributes will come from the dataset. This makes the algorithm more reliable for discrete or even binary data sets. In contrast, the use of means or Euclidean-distance medians will not necessarily yield individual attributes from the dataset. Even with the Manhattan-distance formulation, the individual attributes may come from different instances in the dataset; thus, the resulting median may not be a member of the input dataset.
This algorithm is often confused with the ''k''-medoids algorithm. However, a medoid has to be an actual instance from the dataset, while for the multivariate Manhattan-distance median this only holds for single attribute values. The actual median can thus be a combination of multiple instances. For example, given the vectors (0,1), (1,0) and (2,2), the Manhattan-distance median is (1,1), which does not exist in the original data, and thus cannot be a medoid.
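The worked example above can be verified directly in Python (the helper names are illustrative; `coord_median` assumes an odd number of points so no averaging is needed):

```python
points = [(0, 1), (1, 0), (2, 2)]

def coord_median(pts):
    # Manhattan-distance median: take the median independently per dimension.
    # With an odd number of points, each coordinate is an exact data value,
    # but the combined tuple need not be a member of the dataset.
    mid = len(pts) // 2
    return tuple(sorted(dim)[mid] for dim in zip(*pts))

def medoid(pts):
    # Medoid: the dataset member minimizing total Manhattan distance
    # to all points, so it is always an actual instance.
    def cost(c):
        return sum(sum(abs(a - b) for a, b in zip(p, c)) for p in pts)
    return min(pts, key=cost)

print(coord_median(points))  # (1, 1) -- not one of the input points
print(medoid(points))        # always an actual input point
```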

Excerpt source: the free encyclopedia Wikipedia
Read the full text of "K-medians clustering" on Wikipedia